Note: This page's design, presentation and content have been created and enhanced using Claude (Anthropic's AI assistant) to improve visual quality and educational experience.
Week 7 • Sub-Lesson 1

💻 Natural Language to Code — The New Interface

How describing what you want in plain English has become a viable way to produce working code — and what you still need to understand to use it responsibly

What We'll Cover

Something genuinely revolutionary has happened in the past two years. Researchers with no programming background can now describe an analysis in plain English and receive working Python, R, or MATLAB code in return. You can say "load this CSV file, calculate summary statistics for each column, and plot a histogram of the age variable" — and get a script that actually runs.

This is not a gimmick. It is a fundamental change in who can do computational research. Tasks that previously required months of programming courses, or hiring a programmer, or spending days on Stack Overflow, can now be accomplished in minutes through conversation. For many of you in this room, this is the single most practical thing you will learn in this course.

But — and this is the critical caveat — while you do not need to become a programmer, you do need to understand what the code is doing. Modern agentic tools like Claude Code are remarkably good at catching and fixing their own syntax errors through autonomous iteration — they run the code, see what failed, and fix it without you lifting a finger. What they cannot reliably catch is code that runs perfectly but uses the wrong statistical method, the wrong variable, or the wrong analytical logic for your research question. This session will orient you to the tools, the quality gap between free and paid versions, the emerging practice of "vibe coding," and the minimum level of scientific literacy you need to use these tools responsibly.

🚀 The Paradigm Shift

For decades, the barrier between having a research question and being able to answer it computationally was programming skill. That barrier has not disappeared, but it has been dramatically lowered.

What Has Changed

Until recently, if you wanted to analyse a dataset, you had two options: learn to program, or find someone who could program to do it for you. Both options introduced delays, dependencies, and friction. Many valuable research questions went unanswered simply because the researcher asking them did not have the technical skills to answer them.

Large language models have created a third option: describe what you want in natural language, and let the AI generate the code. This works because modern LLMs have been trained on millions of code repositories and documentation pages. They have seen essentially every common programming pattern, library call, and data analysis workflow that exists.

What This Means for You

If you have no coding background: This is genuinely democratising. Tasks that were previously impossible for you — loading datasets, running statistical tests, creating visualisations, cleaning messy data — are now accessible through conversation. You are not pretending to be a programmer. You are using a new kind of interface to achieve the same results.

If you already code: This changes your workflow but not your need for expertise. You will write code faster, explore approaches more quickly, and spend less time on boilerplate. But your understanding of algorithms, data structures, and domain-specific methods remains essential — the AI accelerates your work; it does not replace your judgment.

🧪 A Concrete Example

Imagine you have a CSV file with survey data from 500 respondents. You want to know whether responses differ significantly between two groups. Here is what you might type into an AI coding tool:

"Load the file survey_results.csv. Compare the mean scores in the 'satisfaction' column between participants where the 'group' column equals 'control' and where it equals 'intervention'. Run an independent samples t-test and report the t-statistic, p-value, and effect size. Then create a box plot comparing the two groups."

A modern AI coding tool will generate working Python or R code that does exactly this — including loading the library dependencies, handling the file import, running the statistical test, and producing the visualisation. The entire process takes less than a minute.
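To make this concrete, here is a minimal sketch of the kind of Python script such a tool might produce for the prompt above. The file and column names come from the prompt; the synthetic data stands in for survey_results.csv so the sketch is self-contained, and the box plot step is omitted for brevity.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Stand-in for: df = pd.read_csv("survey_results.csv")
# Synthetic data with the structure described in the prompt.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "group": ["control"] * 50 + ["intervention"] * 50,
    "satisfaction": np.concatenate([
        rng.normal(3.0, 0.8, 50),   # control group scores
        rng.normal(3.4, 0.8, 50),   # intervention group scores
    ]),
})

control = df.loc[df["group"] == "control", "satisfaction"]
intervention = df.loc[df["group"] == "intervention", "satisfaction"]

# Independent samples t-test
t_stat, p_value = stats.ttest_ind(control, intervention)

# Effect size: Cohen's d using a pooled standard deviation
pooled_sd = np.sqrt((control.var(ddof=1) + intervention.var(ddof=1)) / 2)
cohens_d = (intervention.mean() - control.mean()) / pooled_sd

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, d = {cohens_d:.3f}")
```

Notice that the tool makes methodological choices here — a t-test, Cohen's d with a pooled standard deviation — that you would still need to confirm are appropriate for your data.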

The key insight: you described what you wanted, not how to implement it. The AI translated your intent into specific technical instructions. This is what we mean by natural language as an interface to code.

💡 The Critical Caveat

Modern agentic tools like Claude Code are excellent at catching their own programming errors — they iterate autonomously, fix syntax issues, and can even inspect their own visualisation outputs. This is a genuine capability leap from earlier AI coding tools. But there is a category of error these tools do not reliably catch: code that runs perfectly but uses the wrong statistical test, the wrong variable, or the wrong analytical logic for your research question.

A t-test applied to ordinal Likert data will run without errors and give you a p-value. That p-value may be wrong. No amount of autonomous iteration catches this — because from the code's perspective, nothing failed. You do not need to write code, but you absolutely need the scientific understanding to judge whether the code is doing the right analysis. We will return to this in Section 5.

🛠️ The Tools Landscape

The ecosystem of AI coding tools has expanded rapidly. Each tool has a different approach, different strengths, and a different trade-off between ease of use and capability. Here is what matters for researchers.

Claude Code

What it is: A terminal-based, agentic coding tool from Anthropic. It runs in your computer's command line and can read and write files on your system, execute code, and iterate on results — all through natural language conversation.

Key strengths: Strongest reasoning capabilities of any current coding tool. Can work with your actual files and project structure. Agentic mode means it can plan multi-step tasks, run code, check results, and fix errors autonomously.

Limitations: Requires comfort with the terminal (command line). The free tier is limited in usage. Steeper learning curve than chat-based alternatives.

Pricing: Free tier limited. Pro tier $20/month.

ChatGPT Code Interpreter

What it is: OpenAI's built-in Python execution environment inside ChatGPT. You can upload data files directly, and it will write and run Python code in a sandboxed environment, returning results and visualisations in the chat.

Key strengths: Extremely easy to use — no setup required. Upload a CSV and start asking questions. Generates visualisations inline. Good for exploratory analysis and quick data tasks.

Limitations: Python only (no R or MATLAB). The sandbox environment has limited libraries. Cannot access your local files or other systems. Session state can be lost.

Pricing: Free tier available. Plus $20/month.

GitHub Copilot

What it is: An AI autocomplete tool that integrates directly into code editors like VS Code and JetBrains. As you type code, it suggests completions — from single lines to entire functions — based on the context of what you are writing.

Key strengths: Seamless integration into the coding workflow. Excellent for writing code faster if you already know how to code. Supports virtually every programming language. The inline suggestion model is very natural once you get used to it.

Limitations: Requires a code editor and some coding knowledge to use effectively. It completes code — it does not have a conversation with you about your analysis goals. Less useful for complete beginners.

Pricing: Free tier for students and educators. Pro $10/month.

Cursor

What it is: An AI-first code editor built on VS Code. It combines the familiar editing experience with deep AI integration: you can chat with your codebase, make multi-file edits through natural language, and run up to eight parallel AI agents on different tasks.

Key strengths: The most powerful AI integration of any code editor. Multi-file awareness means it understands your entire project. Natural language commands for editing and refactoring. Parallel agents for complex tasks.

Limitations: Still a code editor — more approachable than a raw terminal, but still oriented toward people who work with code regularly. The free tier is quite limited.

Pricing: Free tier limited. Pro $20/month.

Google Colab AI

What it is: AI assistance built directly into Google Colaboratory, a free cloud-based Jupyter notebook environment. You can write natural language prompts in code cells and get AI-generated code suggestions.

Key strengths: Free with any Google account. Runs in the browser — no installation needed. Access to free GPU/TPU resources for machine learning tasks. Familiar notebook interface for iterative analysis. Easy to share and collaborate.

Limitations: AI features are less sophisticated than dedicated tools. Python-only. Session timeouts can lose your work. The free tier has limited compute resources.

Pricing: Free with Google account. Colab Pro from $10/month for more resources.

Gemini Code Assist

What it is: Google's AI coding assistant, integrated into various IDEs including VS Code and JetBrains, as well as Google Cloud tools. Uses Google's Gemini models for code generation, completion, and explanation.

Key strengths: Strong integration with the Google Cloud ecosystem. Good at working with Google-specific tools and services. Supports multiple languages. Codebase-aware for context-appropriate suggestions.

Limitations: Less established than Copilot in terms of community adoption. Some features are tied to Google Cloud Platform. Quality can be variable depending on the language and framework.

Pricing: Free tier available. Enterprise pricing varies.

📋 Which Tool Should You Start With?

If you have never written code before: Start with ChatGPT Code Interpreter or Google Colab AI. Both let you upload data and start asking questions immediately, with zero setup. You will hit limitations eventually, but they are the gentlest entry point.

If you have some coding experience: Try Cursor or GitHub Copilot. Both integrate into the coding workflow you already know and will accelerate it significantly.

If you want maximum power and are willing to learn: Claude Code offers the strongest reasoning and most autonomous capabilities, but requires comfort with command-line tools.

💰 Free vs Paid — The Quality Gap

This section matters. Many of you will be using free tiers of these tools, and you need an honest assessment of what free versions can and cannot do. The quality gap in AI coding tools is real, and pretending otherwise would be a disservice.

Where the Differences Lie

The gap between free and paid AI coding tools is not just about speed or convenience. It affects the quality of the code you receive and, by extension, the reliability of your research analysis. Here are the key dimensions:

  • Context window: Free tiers typically offer much smaller context windows — the amount of information the AI can consider at once. This matters enormously when working with data. If the AI cannot see your entire dataset structure, your full script, or the error messages alongside your code, it will produce less accurate and less relevant suggestions. Paid tiers may offer 100,000+ tokens of context versus 8,000–32,000 on free tiers.
  • Model quality: Free tiers often route you to older or smaller models. The difference in code quality is substantial: better models write more idiomatic code, catch edge cases, choose appropriate libraries, and understand complex multi-step requests. They are also significantly better at debugging — understanding why code fails, not just what to try next.
  • Rate limits: Free tiers restrict how many requests you can make per hour or per day. During an intensive analysis session — where you are iterating on code, fixing bugs, and refining visualisations — you can easily hit limits that force you to stop working or wait. This breaks your analytical flow at exactly the moment when momentum matters most.
  • File access and code execution: Some free tiers cannot read your files or run code at all. You are reduced to copy-pasting snippets back and forth, which introduces errors, loses context, and makes multi-step analysis painfully slow. Paid tiers often include direct file access and execution environments.

Where Free Is Sufficient

Free tiers are genuinely useful for many common research tasks, and you should not feel that paid access is required to benefit from AI coding assistance:

  • Simple code generation: Writing a function to calculate a specific metric, converting between data formats, or generating boilerplate code
  • Learning syntax: Understanding how a particular library works, what arguments a function takes, or how to accomplish a basic task in a language you are learning
  • Basic data exploration: Loading a dataset, computing summary statistics, creating simple visualisations — especially through tools like Colab AI that provide free execution environments
  • Getting started with a new library: Understanding the basic patterns of pandas, ggplot2, matplotlib, or any other common data analysis library
  • Understanding error messages: Pasting an error message and getting a clear explanation of what went wrong and how to fix it — often the single most valuable use of AI for beginners

Where Paid Makes a Real Difference

The gap becomes apparent in more demanding scenarios, particularly those common in research:

  • Complex multi-step analysis: When your analysis requires loading data, cleaning it, merging datasets, running statistical models, and generating publication-quality figures — all as a coherent workflow — paid tools handle this dramatically better
  • Debugging subtle errors: When code runs but produces wrong results, identifying the problem requires deep reasoning about data flow and logic. Better models are substantially more capable here
  • Working with large datasets: Larger context windows mean the AI can consider more of your data structure, variable names, and relationships simultaneously
  • Iterative refinement: Research analysis is rarely one-shot. You refine, adjust, and explore. Paid tiers support the sustained back-and-forth conversation this requires
  • Production-quality code: Code that needs to be reproducible, well-documented, and robust enough for publication benefits significantly from better models

💡 Strategies for Maximising Free Tier Value

If you are working within free tiers — which is perfectly reasonable — these strategies will help you get the most out of them:

  • Break tasks into small pieces: Instead of asking for an entire analysis pipeline, ask for one step at a time. This works within smaller context windows and produces more reliable code
  • Provide maximum context in your prompts: Tell the AI about your data structure, variable types, what you have already tried, and what you expect the output to look like. The more context you provide upfront, the less the AI needs to guess
  • Use different free tools for different strengths: Google Colab for free code execution, Claude free tier for reasoning through complex problems, GitHub Copilot's free tier for autocomplete while coding. No single free tool does everything well, but the combination covers a lot of ground
  • Save your prompts: When a prompt produces good results, save it. Building a personal library of effective prompts is like building a toolkit — it compounds over time and reduces the number of requests you need to make

⚠️ The Honest Assessment

The quality gap between free and paid AI coding tools is larger than in any other category we have covered in this course. If your research depends heavily on computational analysis, the investment in a paid tier may be one of the most cost-effective you can make — a single month's subscription costs less than a textbook and can save dozens of hours. But you should know exactly what you are paying for and what the free alternatives can still do. Do not let marketing convince you that free tools are useless, and do not let optimism convince you that free tools are equivalent to paid ones. The truth is in between, and where you fall depends on the complexity of your work.

Tool | Free Tier | Paid Tier | Best For
Claude Code | Limited usage | $20/mo (Pro) | Complex reasoning, multi-file projects, agentic workflows
ChatGPT Code Interpreter | Basic access available | $20/mo (Plus) | Quick data exploration, beginners, visual output
GitHub Copilot | Free for students/educators | $10/mo (Pro) | Inline code completion, working within an editor
Cursor | Limited completions | $20/mo (Pro) | Multi-file editing, codebase-wide changes, parallel agents
Google Colab AI | Free with Google account | $10/mo (Colab Pro) | Notebook-based analysis, free GPU access, collaboration
Gemini Code Assist | Free tier available | Enterprise pricing | Google Cloud integration, multi-language support

🎵 "Vibe Coding" — Promise and Peril

A term emerged in early 2025 for this new way of writing code by describing what you want rather than specifying how to implement it. It captures both the appeal and the danger of the approach.

What Is Vibe Coding?

The term "vibe coding" was coined in early 2025 by Andrej Karpathy to describe a style of programming where you guide AI through natural language descriptions, accepting or modifying what it produces based on whether the output "feels right" rather than verifying every line. You describe the vibe of what you want, and the AI fills in the technical details.

At its best, vibe coding is a powerful rapid prototyping technique. At its worst, it produces code that appears to work but answers the wrong question. The distinction between these outcomes depends almost entirely on whether you verify the scientific correctness of the analysis — not just whether the code runs.

An important caveat for 2026: Modern agentic tools have largely eliminated the old "paste error message back and forth" problem. Claude Code, Codex, and similar tools run code autonomously, read the error, fix it, and iterate — often without any human intervention. They can also view their own plot outputs and critique them. This is a genuine improvement over earlier tools. But this capability addresses programming errors. It does not address scientific errors — and those are the ones that matter most for research.

When Vibe Coding Works Well

There are legitimate, productive uses of the vibe coding approach in research:

  • Prototyping: When you want to quickly test whether an approach is feasible before investing time in a careful implementation. If the prototype works, you can then verify and refine the code
  • Exploratory analysis: When you are exploring a dataset to understand its structure, distributions, and patterns. At this stage, approximate answers are fine because you are generating hypotheses, not testing them
  • Learning: When you are trying to understand how a library or technique works. Having AI generate examples that you then study and modify is an effective learning strategy
  • Quick utility scripts: Renaming files, converting formats, reorganising data — tasks where the correctness is obvious from the output

When Vibe Coding Is Dangerous

The same approach becomes risky when the stakes are higher:

  • Analysis for publication: Code that produces results you will report in a paper must be verified, not vibed. A wrong statistical test or a subtle data filtering error can invalidate your findings
  • Anything where errors have consequences: If your analysis informs policy decisions, clinical recommendations, or resource allocation, "it looks about right" is not an acceptable standard
  • Complex data pipelines: When data passes through multiple transformation steps, errors can compound. Each step might look reasonable in isolation while the pipeline as a whole produces garbage
  • Reproducibility-critical work: If other researchers need to reproduce your analysis, vibe-coded scripts with unclear logic and no documentation will fail the reproducibility test

💡 The Parallel to AI Writing

There is a direct connection to what we discussed in Week 6. Just as AI can write prose that sounds right but says something subtly wrong, it can write code that looks right but computes something subtly wrong. In both cases, the surface quality masks potential problems underneath. And in both cases, the solution is the same: you need to understand what the output is doing well enough to catch errors, not just well enough to accept it.

The "explain this paragraph" test from Week 6 has an exact analogue in code: can you explain what each section of the AI-generated code does and why? If not, you are vibe coding, and you should not trust the output for anything that matters.

⚠️ The Remaining Problem

The vibe coding risk has shifted. In 2024, the danger was code that crashed halfway through your analysis. Today, with agentic tools handling iteration autonomously, that risk is largely gone. The danger is now more subtle: a script that runs without errors, produces a polished plot, and returns a confident p-value — while using the wrong test for your data type, silently excluding observations in a way that biases your results, or modelling the wrong relationship entirely. These are scientific errors, not programming errors, and no amount of autonomous code iteration will catch them. The only defence is understanding what the analysis is doing and whether it matches your research question.

📖 What You Need to Know Even Without Programming

You do not need to write code from scratch. But you do need to read code well enough to verify that AI-generated scripts are doing what you intend. Here is the minimum set of concepts that every AI-assisted researcher should understand.

Variables and Data Types

A variable is a named container that holds a value. When AI-generated code says mean_score = df['satisfaction'].mean(), you should understand that mean_score is storing the average of the 'satisfaction' column, and df is the dataset.

Data types matter because operations that work on one type may fail or behave unexpectedly on another. A column of numbers stored as text will not compute a correct average — and AI-generated code does not always catch this. Knowing the difference between numeric, text, and categorical data is essential.
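As a small illustration of why this matters, here is a sketch (with invented values) of a numeric column that arrived as text, and the standard pandas fix:

```python
import pandas as pd

# Hypothetical ages that arrived as text, e.g. from a messy CSV export
df = pd.DataFrame({"age": ["34", "41", "29", "abc", "55"]})

# Calling df["age"].mean() on these strings would raise a TypeError
# rather than compute an average. Convert explicitly; errors="coerce"
# turns unparseable entries (here "abc") into NaN instead of failing.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

print(df["age"].mean())  # NaN values are skipped: (34+41+29+55)/4 = 39.75
```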

Functions and Libraries

A function is a reusable block of code that takes input and produces output. t_test(group_a, group_b) takes two sets of values and returns a test result. You do not need to know how the t-test is implemented, but you need to know what it expects as input and what it returns.

Libraries (also called packages) are collections of pre-written functions. When code starts with import pandas as pd or library(tidyverse), it is loading tools that other programmers have built. Understanding which libraries are standard for your type of analysis helps you judge whether the AI is making sensible choices.
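A toy example of the function idea, using a hypothetical helper rather than any real library call:

```python
# A function: named, takes input, returns output. You do not need to
# read its body to use it — only to know what goes in and what comes out.
def mean_of(values):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

result = mean_of([2, 4, 6])
print(result)  # → 4.0
```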

Loops and Conditionals

A loop repeats an operation multiple times — for example, processing each row in a dataset or running an analysis on each subgroup. When you see for group in groups:, the code inside the loop will execute once for each group.

A conditional executes code only when a condition is true. if p_value < 0.05: means the indented code only runs when the p-value is below the threshold. These are the basic building blocks of any analysis logic, and understanding them helps you trace what AI-generated code is actually doing step by step.
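Put together, a few lines like the following (with invented scores and an arbitrary threshold) show both building blocks in action:

```python
# Invented subgroup scores — purely illustrative
scores = {
    "control": [3, 4, 2, 5, 3],
    "intervention": [4, 5, 4, 5, 3],
}

flagged = []
for group, values in scores.items():    # loop: runs once per group
    mean = sum(values) / len(values)
    if mean > 3.5:                      # conditional: runs only when True
        flagged.append(group)
    print(f"{group}: mean = {mean:.2f}")

print("groups above threshold:", flagged)  # → ['intervention']
```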

Reading Error Messages

When code fails, it produces an error message. These messages are not written for beginners, but they contain valuable information. A typical Python error tells you: which line failed, what type of error occurred, and often a description of the problem.

The most practical skill you can develop is learning to copy the error message into an AI tool and ask for an explanation. This is one area where even free-tier AI tools excel — translating cryptic error messages into plain English and suggesting fixes. You do not need to memorise error types. You need to know that the error message is the starting point for fixing the problem, not a reason to give up.
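For instance, misspelling a column name — one of the most common beginner mistakes — produces an error that names the missing column. This sketch deliberately triggers and captures one:

```python
import pandas as pd

df = pd.DataFrame({"satisfaction": [3, 4, 5]})

try:
    df["satisfication"].mean()   # typo: the column is actually "satisfaction"
except KeyError as err:
    # The error names the missing column — this is the text you would
    # paste into an AI tool and ask: "what does this error mean?"
    message = str(err)
    print("KeyError:", message)
```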

🧪 The Difference Between "Runs" and "Correct"

Consider this scenario: you ask AI to compare satisfaction scores between two groups. The AI generates code that runs perfectly and produces a p-value of 0.03. You conclude the groups are significantly different.

But look closer at the code. The AI used a standard t-test, which assumes your data is normally distributed. Your satisfaction scores are on a 1–5 Likert scale — they are ordinal, not normally distributed. A Mann-Whitney U test would have been more appropriate. The "correct" code might give you a p-value of 0.08 — not significant.
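The contrast can be seen directly: both tests below run without complaint on the same made-up Likert responses, yet they answer subtly different statistical questions. The data here is invented for illustration — not the scenario's actual numbers — so the specific p-values carry no meaning.

```python
import numpy as np
from scipy import stats

# Invented 1–5 Likert responses for two groups
control      = np.array([2, 3, 3, 2, 4, 3, 2, 3, 3, 2, 4, 3])
intervention = np.array([3, 4, 3, 5, 4, 3, 4, 5, 3, 4, 4, 3])

t_stat, p_t = stats.ttest_ind(control, intervention)      # assumes roughly normal data
u_stat, p_u = stats.mannwhitneyu(control, intervention)   # rank-based; suits ordinal data

# Both succeed — nothing "failed" — yet only one matches the data type.
print(f"t-test:       p = {p_t:.4f}")
print(f"Mann-Whitney: p = {p_u:.4f}")
```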

The code ran. It produced a result. That result was wrong — not because the AI made a programming error, but because it chose an inappropriate statistical method for your data type. This is exactly the kind of error that vibe coding will miss and that a basic understanding of your analysis can catch.

The lesson: Always ask yourself — is this code doing the right analysis, or just an analysis? If you are unsure, describe your data and the test the AI chose to a second AI tool and ask whether the choice is appropriate. Cross-checking between tools is one of the most effective strategies available to non-programmers.
Here is a practical checklist for verifying any AI-generated analysis before you trust its results:

  1. Describe your goal clearly. Be specific about your data (types, size, structure), what question you are trying to answer, and what you expect the output to look like. Vague prompts produce vague code.
  2. Ask the AI to explain the code it generates. Before running anything, ask "explain what each section of this code does." If the explanation does not match your intent, the code is wrong — even if it runs.
  3. Check the statistical choices. Ask "is [test name] appropriate for my data?" Describe your variables (ordinal, continuous, categorical) and your sample size. The AI should justify its choice, not just make one.
  4. Verify with known values. If possible, run the code on a small dataset where you know the correct answer. If your code says the mean of [2, 4, 6] is not 4, something is wrong.
  5. Read the output critically. Do the numbers make sense? Is the range plausible? Are there obvious anomalies? Your domain expertise is your most valuable debugging tool — you know what the answer should approximately look like even before running the analysis.
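Step 4 above can even be written as code: wrap the generated analysis step in a quick assertion on input whose answer you already know. The function here is a hypothetical stand-in for whatever the AI produced.

```python
def analysis_step(values):
    # Stand-in for the AI-generated computation you want to sanity-check
    return sum(values) / len(values)

# Known input, known answer: the mean of [2, 4, 6] must be 4.
assert analysis_step([2, 4, 6]) == 4, "sanity check failed"
print("sanity check passed")
```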

📚 Readings

Three core readings for this sub-lesson. The first two are practical guides aimed at researchers using AI coding tools. The third is an empirical evaluation of how well AI performs as a data analyst.

📄 Core Reading 1

Mineault, P. (2026). "Claude Code for Scientists." — A practical guide by a neuroscientist on how Claude Code fits into a scientific workflow. Covers setup, common use cases, and the balance between AI assistance and scientific rigour. Particularly relevant for researchers who want to understand the agentic approach to AI-assisted coding.

📄 Core Reading 2

Dataquest (2025). "Getting Started with Claude Code for Data Scientists." — A step-by-step introduction to using Claude Code for data analysis tasks. Good for understanding the practical workflow: how you interact with the tool, what kinds of tasks it handles well, and where it needs guidance.

📄 Core Reading 3

Cheng, L., Li, X., & Bing, L. (2023). "Is GPT-4 a Good Data Analyst?" arXiv:2305.15038 — An empirical evaluation of GPT-4's capabilities as a data analyst, testing it on real-world data analysis tasks. The findings are nuanced: strong on standard tasks, weaker on edge cases and domain-specific judgment. Essential reading for understanding both the capabilities and limitations of AI-assisted data analysis.

Supplementary readings for those who want to go deeper:

📄 Supplementary Reading 1

Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science, 2nd Edition. — Free online. Even if you plan to use Python, this book is one of the best introductions to thinking about data analysis systematically. The concepts transfer across languages, and AI tools can help you translate R patterns into Python or vice versa.

📄 Supplementary Reading 2

Hong, S., et al. (2024). "Data Interpreter: An LLM Agent for Data Science." arXiv:2402.18679 — A research paper on building LLM-based agents specifically for data science tasks. Relevant for understanding where the field is heading: AI tools that do not just generate code but plan, execute, and iteratively refine entire analysis pipelines.

📚 Summary & What Comes Next

This session introduced the paradigm shift that AI coding tools represent for researchers: the ability to describe analysis in natural language and receive working code. We surveyed the tools available, from beginner-friendly options like ChatGPT Code Interpreter and Google Colab to power tools like Claude Code and Cursor.

We addressed the quality gap between free and paid tiers honestly — free tools are genuinely useful for many tasks, but the difference in complex analysis scenarios is substantial. We examined "vibe coding" as both a useful prototyping approach and a dangerous habit when applied to research that will be published. And we outlined the minimum code literacy that every AI-assisted researcher needs: not the ability to write code, but the ability to read it well enough to verify that it does what you intend.

  • Natural language to code is real and practical: You can describe what you want and get working code. This is a genuine democratisation of computational research
  • The tools vary significantly: Choose based on your experience level, your needs, and whether free tiers are sufficient for your work
  • Free vs paid matters: The gap is real, especially for complex analysis. Use strategies to maximise free tier value, and consider paid tiers as a research investment if computation is central to your work
  • Vibe coding has its place: For prototyping and exploration, it is useful. For publication-quality analysis, it is dangerous. Know the difference
  • You need code literacy, not coding skill: Reading code, understanding error messages, and verifying statistical choices are the essential skills for AI-assisted research

Next session: In Sub-Lesson 2 (AI-Assisted Data Analysis in Practice), we move from concepts to hands-on work — loading real datasets, generating analysis code, debugging errors, and producing visualisations using the tools and principles introduced today.